diagnosis code
- North America > United States (0.28)
- North America > Canada > Quebec > Montreal (0.04)
LTR-ICD: A Learning-to-Rank Approach for Automatic ICD Coding
Mansoori, Mohammad, Soliman, Amira, Etminani, Farzaneh
Clinical notes contain unstructured text provided by clinicians during patient encounters. These notes are usually accompanied by a sequence of diagnostic codes following the International Classification of Diseases (ICD). Correctly assigning and ordering ICD codes is essential for medical diagnosis and reimbursement. However, automating this task remains challenging. State-of-the-art methods treat this problem as a classification task and therefore ignore the order of ICD codes, which is essential for different purposes. In this work, as a first attempt, we approach this task from a retrieval system perspective to consider the order of codes, thus formulating this problem as a classification and ranking task. Our results and analysis show that the proposed framework has a superior ability to identify high-priority codes compared to other methods. For instance, our model's accuracy in correctly ranking primary diagnosis codes is 47%, compared to 20% for the state-of-the-art classifier. Additionally, in terms of classification metrics, the proposed model achieves micro- and macro-F1 scores of 0.6065 and 0.2904, respectively, surpassing the previous best model with scores of 0.597 and 0.2660.
- South America > Chile > Santiago Metropolitan Region > Santiago Province > Santiago (0.04)
- North America > United States > Washington > King County > Seattle (0.04)
- North America > United States > Louisiana > Orleans Parish > New Orleans (0.04)
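The metrics named in the LTR-ICD abstract (micro-F1, macro-F1, and accuracy on the primary diagnosis code) can be illustrated with a minimal Python sketch. The function names, the toy codes, and the choice to average macro-F1 over the labels that appear in the gold or predicted sets are assumptions made for illustration; the paper's own evaluation code may differ.

```python
from collections import Counter

def f1(tp, fp, fn):
    """F1 from raw counts; returns 0.0 when all counts are zero."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0

def micro_macro_f1(gold, pred):
    """gold, pred: lists of code sets, one set per clinical note."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for g, p in zip(gold, pred):
        for c in g & p:
            tp[c] += 1
        for c in p - g:
            fp[c] += 1
        for c in g - p:
            fn[c] += 1
    labels = set(tp) | set(fp) | set(fn)
    # Micro: pool counts over all labels. Macro: average per-label F1.
    micro = f1(sum(tp.values()), sum(fp.values()), sum(fn.values()))
    macro = sum(f1(tp[c], fp[c], fn[c]) for c in labels) / len(labels) if labels else 0.0
    return micro, macro

def primary_code_accuracy(gold_ranked, pred_ranked):
    """Share of notes whose top-ranked prediction equals the first gold code."""
    hits = sum(1 for g, p in zip(gold_ranked, pred_ranked) if g and p and g[0] == p[0])
    return hits / len(gold_ranked)

# Toy data (hypothetical): codes listed in priority order, primary code first.
gold_ranked = [["I21", "E11"], ["J18", "I10"]]
pred_ranked = [["I21", "I10"], ["I10", "J18"]]

micro, macro = micro_macro_f1([set(g) for g in gold_ranked],
                              [set(p) for p in pred_ranked])
primary_acc = primary_code_accuracy(gold_ranked, pred_ranked)
print(micro, macro, primary_acc)
```

In the second toy note the predicted set is exactly right, so both F1 scores count it as fully correct, yet the primary code is ranked second. This is the kind of ordering error that set-based classification metrics cannot see.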
ASCENDgpt: A Phenotype-Aware Transformer Model for Cardiovascular Risk Prediction from Electronic Health Records
Sainsbury, Chris, Karwath, Andreas
We present ASCENDgpt, a transformer-based model specifically designed for cardiovascular risk prediction from longitudinal electronic health records (EHRs). Our approach introduces a novel phenotype-aware tokenization scheme that maps 47,155 raw ICD codes to 176 clinically meaningful phenotype tokens, achieving 99.6% consolidation of diagnosis codes while preserving semantic information. This phenotype mapping contributes to a total vocabulary of 10,442 tokens, a 77.9% reduction when compared with using raw ICD codes directly. We pretrain ASCENDgpt on sequences derived from 19,402 unique individuals using a masked language modeling objective, then fine-tune for time-to-event prediction of five cardiovascular outcomes: myocardial infarction (MI), stroke, major adverse cardiovascular events (MACE), cardiovascular death, and all-cause mortality. Our model achieves excellent discrimination on the held-out test set with an average C-index of 0.816, demonstrating strong performance across all outcomes (MI: 0.792, stroke: 0.824, MACE: 0.800, cardiovascular death: 0.842, all-cause mortality: 0.824). The phenotype-based approach enables clinically interpretable predictions while maintaining computational efficiency. Our work demonstrates the effectiveness of domain-specific tokenization and pretraining for EHR-based risk prediction tasks.
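The percentages and the average C-index quoted in the ASCENDgpt abstract can be rechecked from the stated counts. The sketch below assumes that the 77.9% vocabulary reduction is measured against the 47,155 raw ICD codes; the abstract does not state the baseline explicitly, so that reading is an assumption. Variable names are illustrative.

```python
# Figures taken from the abstract.
raw_icd_codes = 47_155
phenotype_tokens = 176
total_vocab = 10_442
c_index = {
    "MI": 0.792,
    "stroke": 0.824,
    "MACE": 0.800,
    "cardiovascular death": 0.842,
    "all-cause mortality": 0.824,
}

# Share of raw diagnosis codes consolidated into phenotype tokens.
consolidation = 1 - phenotype_tokens / raw_icd_codes

# Assumption: the vocabulary reduction is relative to the raw ICD code count.
vocab_reduction = 1 - total_vocab / raw_icd_codes

# Unweighted mean over the five outcomes.
mean_c_index = sum(c_index.values()) / len(c_index)

print(f"consolidation:   {consolidation:.1%}")
print(f"vocab reduction: {vocab_reduction:.1%}")
print(f"mean C-index:    {mean_c_index:.3f}")
```

Under that assumption all three derived values agree with the abstract after rounding.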
Early Detection of Pancreatic Cancer Using Multimodal Learning on Electronic Health Records
Aouad, Mosbah, Choudhary, Anirudh, Farooq, Awais, Nevers, Steven, Demirkhanyan, Lusine, Harris, Bhrandon, Pappu, Suguna, Gondi, Christopher, Iyer, Ravishankar
Pancreatic ductal adenocarcinoma (PDAC) is one of the deadliest cancers, and early detection remains a major clinical challenge due to the absence of specific symptoms and reliable biomarkers. In this work, we propose a new multimodal approach that integrates longitudinal diagnosis code histories and routinely collected laboratory measurements from electronic health records to detect PDAC up to one year prior to clinical diagnosis. Our method combines neural controlled differential equations to model irregular lab time series, pretrained language models and recurrent networks to learn diagnosis code trajectory representations, and cross-attention mechanisms to capture interactions between the two modalities. We develop and evaluate our approach on a real-world dataset of nearly 4,700 patients and achieve significant improvements in AUC ranging from 6.5% to 15.5% over state-of-the-art methods. Furthermore, our model identifies diagnosis codes and laboratory panels associated with elevated PDAC risk, including both established and new biomarkers.
- North America > United States > Illinois > Cook County > Chicago (0.05)
- North America > United States > Illinois > Champaign County > Urbana (0.04)
- Research Report > Experimental Study (1.00)
- Research Report > New Finding (0.93)
- Health & Medicine > Therapeutic Area > Gastroenterology (1.00)
- Health & Medicine > Health Care Technology > Medical Record (1.00)
- Health & Medicine > Therapeutic Area > Endocrinology > Diabetes (0.93)
- Health & Medicine > Therapeutic Area > Oncology > Pancreatic Cancer (0.72)
A Multi-Phase Analysis of Blood Culture Stewardship: Machine Learning Prediction, Expert Recommendation Assessment, and LLM Automation
Amrollahi, Fatemeh, Marshall, Nicholas, Haredasht, Fateme Nateghi, Black, Kameron C, Zahedivash, Aydin, Maddali, Manoj V, Ma, Stephen P., Chang, Amy, Deresinski, MD Phar Stanley C, Goldstein, Mary Kane, Asch, Steven M., Banaei, Niaz, Chen, Jonathan H
Blood cultures are often over-ordered without clear justification, straining healthcare resources and contributing to inappropriate antibiotic use, pressures worsened by the global shortage. In a study of 135,483 emergency department (ED) blood culture orders, we developed machine learning (ML) models to predict the risk of bacteremia using structured electronic health record (EHR) data and provider notes via a large language model (LLM). The structured model's AUC improved from 0.76 to 0.79 with note embeddings and reached 0.81 with added diagnosis codes. Compared to an expert recommendation framework applied by human reviewers and an LLM-based pipeline, our ML approach offered higher specificity without compromising sensitivity. The recommendation framework achieved a sensitivity of 86% and a specificity of 57%, while the LLM maintained high sensitivity (96%) but over-classified negatives, reducing specificity (16%). These findings demonstrate that ML models integrating structured and unstructured data can outperform consensus recommendations, enhancing diagnostic stewardship beyond existing standards of care.
- North America > United States > California > Santa Clara County > Stanford (0.05)
- North America > United States > California > Santa Clara County > Palo Alto (0.04)
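The blood culture abstract compares approaches by sensitivity and specificity. As a reminder of how the two are computed from binary predictions, here is a minimal sketch. The labels are made-up toy values, not data from the study, and the function name is illustrative.

```python
def sensitivity_specificity(y_true, y_pred):
    """Return (sensitivity, specificity) for binary labels, where 1 = bacteremia."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    return sensitivity, specificity

# Toy labels (hypothetical): 3 true positives cases, 5 true negative cases.
y_true = [1, 1, 1, 0, 0, 0, 0, 0]

# A cautious classifier that flags almost every order as positive keeps
# sensitivity high but pays for it in specificity.
flag_most = [1, 1, 1, 1, 1, 1, 1, 0]
sens, spec = sensitivity_specificity(y_true, flag_most)
print(sens, spec)
```

A classifier that labels most negative cases as positive behaves like the LLM pipeline described in the abstract: few missed positives, many unnecessary cultures.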
Heterogeneous Causal Discovery of Repeated Undesirable Health Outcomes
Adhikari, Shishir, Muscioni, Guido, Shapiro, Mark, Petrov, Plamen, Zheleva, Elena
Understanding factors triggering or preventing undesirable health outcomes across patient subpopulations is essential for designing targeted interventions. While randomized controlled trials and expert-led patient interviews are standard methods for identifying these factors, they can be time-consuming and infeasible. Causal discovery offers an alternative to conventional approaches by generating cause-and-effect hypotheses from observational data. However, it often relies on strong or untestable assumptions, which can limit its practical application. This work aims to make causal discovery more practical by considering multiple assumptions and identifying heterogeneous effects. We formulate the problem of discovering causes and effect modifiers of an outcome, where effect modifiers are contexts (e.g., age groups) with heterogeneous causal effects. Then, we present a novel, end-to-end framework that incorporates an ensemble of causal discovery algorithms and estimation of heterogeneous effects to discover causes and effect modifiers that trigger or inhibit the outcome. We demonstrate that the ensemble approach improves robustness by enhancing recall of causal factors while maintaining precision. Our study examines the causes of repeat emergency room visits for diabetic patients and hospital readmissions for ICU patients. Our framework generates causal hypotheses consistent with existing literature and can help practitioners identify potential interventions and patient subpopulations to focus on.
- North America > United States > Illinois > Cook County > Chicago (0.04)
- Europe > Netherlands > North Brabant > Eindhoven (0.04)
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
- Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (1.00)
- Information Technology > Artificial Intelligence > Natural Language (0.94)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.68)
Memorize and Rank: Elevating Large Language Models for Clinical Diagnosis Prediction
Ma, Mingyu Derek, Wang, Xiaoxuan, Xiao, Yijia, Cuturrufo, Anthony, Nori, Vijay S, Halperin, Eran, Wang, Wei
Clinical diagnosis prediction models, when provided with a patient's medical history, aim to detect potential diseases early, facilitating timely intervention and improving prognostic outcomes. However, the inherent scarcity of patient data and the large disease candidate space often pose challenges in developing satisfactory models for this intricate task. The use of Large Language Models (LLMs) to encapsulate clinical decision processes has seen limited exploration. We introduce MERA, a clinical diagnosis prediction model that bridges pretrained natural language knowledge with medical practice. We apply hierarchical contrastive learning on a disease candidate ranking list to alleviate the large decision space issue. With concept memorization through fine-tuning, we bridge the natural language clinical knowledge with medical codes. Experimental results on the MIMIC-III and MIMIC-IV datasets show that MERA achieves state-of-the-art diagnosis prediction performance and dramatically elevates the diagnosis prediction capabilities of generative LMs.
Practical Design and Benchmarking of Generative AI Applications for Surgical Billing and Coding
Rollman, John C., Rogers, Bruce, Zaribafzadeh, Hamed, Buckland, Daniel, Rogers, Ursula, Gagnon, Jennifer, Meireles, Ozanan, Jennings, Lindsay, Bennett, Jim, Nicholson, Jennifer, Lad, Nandan, Cendales, Linda, Seas, Andreas, Martinino, Alessandro, Hwang, E. Shelley, Kirk, Allan D.
Background: Healthcare has many manual processes that can benefit from automation and augmentation with Generative Artificial Intelligence (AI), including the medical billing and coding process. However, current foundational Large Language Models (LLMs) perform poorly when tasked with generating accurate International Classification of Diseases, 10th edition, Clinical Modification (ICD-10-CM) and Current Procedural Terminology (CPT) codes. Additionally, there are many security and financial challenges in the application of generative AI to healthcare. We present a strategy for developing generative AI tools in healthcare, specifically for medical billing and coding, that balances accuracy, accessibility, and patient privacy. Methods: We fine-tune the Phi-3 Mini and Phi-3 Medium LLMs using institutional data and compare the results against the Phi-3 base model, a Phi-3 RAG application, and GPT-4o. We use the postoperative surgical report as input and the patient's billing claim, with the associated ICD-10, CPT, and Modifier codes, as the target result. Performance is measured by accuracy of code generation, proportion of invalid codes, and the fidelity of the billing claim format. Results: Both fine-tuned models performed as well as or better than GPT-4o. The Phi-3 Medium fine-tuned model showed the best performance (ICD-10 Recall and Precision: 72%, 72%; CPT Recall and Precision: 77%, 79%; Modifier Recall and Precision: 63%, 64%). The Phi-3 Medium fine-tuned model fabricated only 1% of the ICD-10 codes and 0.6% of the CPT codes it generated. Conclusions: Our study shows that a small model that is fine-tuned on domain-specific data for specific tasks, using a simple set of open-source tools and minimal technological and monetary requirements, performs as well as the larger contemporary consumer models.
- North America > United States > North Carolina > Durham County > Durham (0.05)
- North America > United States > Michigan > Washtenaw County > Ann Arbor (0.04)
- Europe > Spain > Catalonia > Barcelona Province > Barcelona (0.04)
Predicting Emergency Department Visits for Patients with Type II Diabetes
Alizadeh, Javad M, Patel, Jay S, Tajeu, Gabriel, Chen, Yuzhou, Hollin, Ilene L, Patel, Mukesh K, Fei, Junchao, Wu, Huanmei
Over 30 million Americans are affected by Type II diabetes (T2D), a treatable condition with significant health risks. This study aims to develop and validate predictive models using machine learning (ML) techniques to estimate emergency department (ED) visits among patients with T2D. Data for these patients were obtained from the HealthShare Exchange (HSX), focusing on demographic details, diagnoses, and vital signs. Our sample contained 34,151 patients diagnosed with T2D, who accounted for 703,065 visits overall between 2017 and 2021. A workflow integrated EMR data with SDoH for ML predictions. A total of 87 out of 2,555 features were selected for model construction. Various machine learning algorithms, including CatBoost, Ensemble Learning, K-nearest Neighbors (KNN), Support Vector Classification (SVC), Random Forest, and Extreme Gradient Boosting (XGBoost), were employed with tenfold cross-validation to predict whether a patient is at risk of an ED visit. The areas under the ROC curve for Random Forest, XGBoost, Ensemble Learning, CatBoost, KNN, and SVC were 0.82, 0.82, 0.82, 0.81, 0.72, and 0.68, respectively. Ensemble Learning and Random Forest models demonstrated superior predictive performance in terms of discrimination, calibration, and clinical applicability. These models are reliable tools for predicting risk of ED visits among patients with T2D. They can estimate future ED demand and assist clinicians in identifying critical factors associated with ED utilization, enabling early interventions to reduce such visits. The top five important features were age, the difference between visitation gaps, visitation gaps, R10 (abdominal and pelvic pain), and the Index of Concentration at the Extremes (ICE) for income.
- Information Technology > Artificial Intelligence > Machine Learning > Ensemble Learning (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Nearest Neighbor Methods (0.55)
- Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.34)
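The ROC figures reported in the T2D abstract are presumably areas under the ROC curve (AUC). As a minimal sketch of what that number measures, the AUC equals the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case, with ties counted as one half. The function name and the toy scores below are illustrative assumptions, not part of the study.

```python
def roc_auc(y_true, scores):
    """AUC via pairwise comparison of positive and negative scores (O(n^2))."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0   # positive ranked above negative
            elif p == n:
                wins += 0.5   # ties count as half
    return wins / (len(pos) * len(neg))

# Toy risk scores (hypothetical): 1 = patient had an ED visit.
y_true = [1, 1, 0, 0, 0]
scores = [0.9, 0.6, 0.7, 0.2, 0.1]
auc = roc_auc(y_true, scores)
print(auc)
```

An AUC of 0.5 corresponds to random ranking and 1.0 to perfect separation, which puts the reported 0.68 to 0.82 range in context.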
Interpretable Hierarchical Attention Network for Medical Condition Identification
Fang, Dongping, Duan, Lian, Yuan, Xiaojing, Klunder, Allyn, Tan, Kevin, Cao, Suiting, Ji, Yeqing, Xu, Mike
Accurate prediction of medical conditions from past clinical evidence is a long-standing goal in the medical management and health insurance field. Although great progress has been made with machine learning algorithms, the medical community is still skeptical about model accuracy and interpretability. This paper presents an innovative hierarchical attention deep learning model, the Interpretable Hierarchical Attention Network (IHAN), that achieves better prediction and clear interpretability that can be easily understood by medical professionals. IHAN uses a hierarchical attention structure that matches naturally with the structure of medical history data and reflects the sequence of a patient's encounters (dates of service). The model's attention structure consists of 3 levels: (1) attention on the medical code types (diagnosis codes, procedure codes, lab test results, and prescription drugs), (2) attention on the sequential medical encounters within a type, (3) attention on the individual medical codes within an encounter and type. The model is applied to predict the occurrence of stage 3 chronic kidney disease (CKD), using three years of medical history of Medicare Advantage (MA) members from an American nationwide health insurance company. The model takes members' medical events, from both claims and Electronic Medical Records (EMR) data, as input, makes a prediction of stage 3 CKD, and calculates the contribution of individual events to the predicted outcome.
- Banking & Finance > Insurance (1.00)
- Health & Medicine > Health Care Technology > Medical Record (0.87)
- Health & Medicine > Therapeutic Area > Nephrology (0.57)
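The three attention levels described in the IHAN abstract (code type, encounter within a type, code within an encounter) can be illustrated with a small pooling sketch. This is not the authors' architecture: the abstract does not specify the layers, so the dot-product scoring, the fixed query vectors (which would be learned parameters in a real model), and all function names here are assumptions.

```python
import math

def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def attend(vectors, query):
    """Softmax-weighted average of vectors, scored by dot product with query."""
    scores = [sum(a * b for a, b in zip(v, query)) for v in vectors]
    weights = softmax(scores)
    pooled = [sum(w * v[d] for w, v in zip(weights, vectors))
              for d in range(len(query))]
    return pooled, weights

def hierarchical_pool(history, q_code, q_enc, q_type):
    """history: {code_type: [encounter, ...]}; an encounter is a list of code vectors."""
    type_vecs, detail = [], {}
    for code_type, encounters in history.items():
        enc_vecs, code_weights = [], []
        for enc in encounters:
            vec, w = attend(enc, q_code)            # level 3: codes in an encounter
            enc_vecs.append(vec)
            code_weights.append(w)
        type_vec, enc_weights = attend(enc_vecs, q_enc)  # level 2: encounters in a type
        type_vecs.append(type_vec)
        detail[code_type] = (enc_weights, code_weights)
    patient_vec, type_weights = attend(type_vecs, q_type)  # level 1: code types
    return patient_vec, dict(zip(history, type_weights)), detail

# Toy 2-dimensional code embeddings (hypothetical values).
history = {
    "diagnosis": [[[1.0, 0.0], [0.0, 1.0]], [[0.5, 0.5]]],
    "lab": [[[0.2, 0.8]]],
}
query = [1.0, 0.0]
patient_vec, type_weights, detail = hierarchical_pool(history, query, query, query)
print(type_weights)
```

Multiplying the weights along one path (type, then encounter, then code) gives a simple per-event contribution score, which is one way to read the abstract's claim that the model reports the contribution of individual events.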